Thread and Memory Placement on NUMA Systems: Asymmetry Matters

نویسندگان

Baptiste Lepers

Vivien Quéma

Alexandra Fedorova

چکیده

It is well known that the placement of threads and memory plays a crucial role for performance on NUMA (Non-Uniform Memory-Access) systems. The conventional wisdom is to place threads close to their memory, to collocate on the same node threads that share data, and to segregate on different nodes threads that compete for memory bandwidth or cache resources. While many studies addressed thread and data placement, none of them considered a crucial property of modern NUMA systems that is likely to prevail in the future: asymmetric interconnect. When the nodes are connected by links of different bandwidth, we must consider not only whether the threads and data are placed on the same or different nodes, but how these nodes are connected. We study the effects of asymmetry on a widely available x86 system and find that performance can vary by more than 2× under the same distribution of thread and data across the nodes but different inter-node connectivity. The key new insight is that the best-performing connectivity is the one with the greatest total bandwidth as opposed to the smallest number of hops. Based on our findings we designed and implemented a dynamic thread and memory placement algorithm in Linux that delivers similar or better performance than the best static placement and up to 218% better performance than when the placement is chosen randomly.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport

Modern shared memory multiprocessor systems commonly have non-uniform memory access (NUMA) with asymmetric memory bandwidth and latency characteristics. Operating systems now provide application programmer interfaces allowing the user to perform specific thread and memory placement. To date, however, there have been relatively few detailed assessments of the importance of memory/thread placemen...

متن کامل

Performance Models for Electronic Structure Methods on Modern Computer Architectures

Electronic structure codes are computationally intensive scientific applications used to probe and elucidate chemical processes at an atomic level. Maximizing the performance of these applications on any given hardware platform is vital in order to facilitate larger and more accurate computations. An important part of this endeavor is the development of protocols for measuring performance, and ...

متن کامل

Memory Latency Reduction with Fine-grain Migrating Threads in Numa Shared-memory Multiprocessors

In order to fully realize the potential performance benefits of large-scale NUMA shared memory multiprocessors, efficient techniques to reduce/tolerate long memory access latencies in such systems are to be developed. This paper discusses the concept, software and hardware support for memory latency reduction through fine-grain non-transparent thread migration, referred to as mobile multithread...

متن کامل

NUMA-aware algorithms: the case of data shuffling

In recent years, a new breed of non-uniform memory access (NUMA) systems has emerged: multi-socket servers of multicores. This paper makes the case that data management systems need to employ designs that take into consideration the characteristics of modern NUMA hardware. To prove our point, we focus on a primitive that is used as the building block of numerous data management operations: data...

متن کامل

Memory and Thread Management on NUMA Systems

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

Thread and Memory Placement on NUMA Systems: Asymmetry Matters

نویسندگان

چکیده

منابع مشابه

Exploring Thread and Memory Placement on NUMA Architectures: Solaris and Linux, UltraSPARC/FirePlane and Opteron/HyperTransport

Performance Models for Electronic Structure Methods on Modern Computer Architectures

Memory Latency Reduction with Fine-grain Migrating Threads in Numa Shared-memory Multiprocessors

NUMA-aware algorithms: the case of data shuffling

Memory and Thread Management on NUMA Systems

عنوان ژورنال:

اشتراک گذاری